Attribute: Useful in Multiple Contexts


In [1]:
from IPython.display import display, Image, HTML
from talktools import website, nbviewer

Overview

Scientific computing and data science are complex activities that involve a wide range of contexts:

  1. Individual, interactive exploration
  2. Debugging, testing
  3. Production runs
  4. Parallel computing
  5. Collaboration
  6. Publication
  7. Presentation
  8. Teaching/Learning

We use a large number of software tools across these contexts:

Python, Ruby, Perl, C, C++, Fortran, Numba, Cython, MPI, Hadoop, Excel, LaTeX, Powerpoint, Word, Keynote, Vim, Emacs, Make, JavaScript, Matlab, Mathematica

Moving among so many tools places a heavy cognitive burden on users. That burden has nothing to do with the challenging technical problems they are trying to solve, and it pulls them away from that work.

We are working hard to make IPython useful in each of the following contexts.

Interactive exploration

First and foremost, IPython is an interactive environment for writing and running code. We provide features to make this as pleasant as possible.

Tab completion:


In [2]:
import math

In [21]:
math.


Out[21]:
<module 'math' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/math.so'>
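
Pressing Tab after `math.` lists the names available on the module. As a rough illustration only, the plain-Python sketch below approximates that listing by filtering `dir(math)` on a prefix. The helper name `complete` is hypothetical, and IPython's real completer does considerably more than this.

```python
import math


def complete(obj, prefix=""):
    """Return the public attribute names of obj that start with prefix.

    Hypothetical helper for illustration; not IPython's completer.
    """
    return sorted(
        name
        for name in dir(obj)
        if name.startswith(prefix) and not name.startswith("_")
    )


# Names a user might see after typing "math.co" and pressing Tab.
matches = complete(math, "co")
print(matches)
```

Names starting with an underscore are skipped here so that the listing stays short; that filter is a choice made for this sketch.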

Interactive help:


In [4]:
math.cos?
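
Outside IPython, the docstring of an object can be read with the standard library. The sketch below uses `inspect.getdoc` to fetch the docstring of `math.cos`; it is a plain-Python approximation, since IPython's `?` also reports details such as the object's type.

```python
import inspect
import math

# Fetch the cleaned-up docstring of math.cos.
doc = inspect.getdoc(math.cos)
print(doc)
```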

Inline plotting:


In [5]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [6]:
plot(rand(50))


Out[6]:
[<matplotlib.lines.Line2D at 0x103ea60d0>]

Seamless access to the system shell:


In [7]:
ls


Close to Data.ipynb                README.md                          ipythonteam/
IPython Project.ipynb              Useful in Multiple Contexts.ipynb  load_style.py
IPython Project.pdf                data/                              load_style.pyc
IPython Project.tex                frontmatter.py                     talk.css
IPython Project_files/             frontmatter.pyc                    talktools.py
LICENSE                            images/                            talktools.pyc
Multi-lingual.ipynb                ipythonproject.py
Open Software Engineering.ipynb    ipythonproject.pyc

IPython was used for interactive, exploratory data science at the first White House Hackathon.


In [8]:
from IPython.display import YouTubeVideo
YouTubeVideo('sjfsUzECqK0')


Out[8]:

Publishing

IPython Notebooks contain everything related to a computation and its results: code, narrative text, equations, plots, images, videos, HTML, and JavaScript. We have developed tools for "publishing" these Notebook documents in different contexts:

  • nbconvert: converts Notebooks to static formats such as HTML, LaTeX, PDF, and slideshows.
  • nbviewer: provides a static HTML view of any Notebook on the internet.

Let's generate a static PDF of this talk's introduction:


In [9]:
!ipython nbconvert --to latex --post pdf "IPython Project.ipynb"


[NbConvertApp] Using existing profile dir: u'/Users/bgranger/.ipython/profile_default'
[NbConvertApp] Converting notebook IPython Project.ipynb to latex
[NbConvertApp] Support files will be in IPython Project_files/
[NbConvertApp] Loaded template latex_article.tplx
[NbConvertApp] Writing 17532 bytes to IPython Project.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running pdflatex 3 times: ['pdflatex', 'IPython Project.tex']
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
[NbConvertApp] Running bibtex 1 time: ['bibtex', 'IPython Project']
[NbConvertApp] WARNING | bibtex had problems, most likely because there were no citations
[NbConvertApp] Removing temporary LaTeX files
[NbConvertApp] PDF successfully created
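
To give a feel for what a converter works with: a Notebook file is a JSON document holding a list of cells. The toy sketch below is not nbconvert's implementation. It assumes the nbformat 4 layout (older format versions organize cells differently), builds a made-up two-cell notebook in memory, and turns it into a plain script by keeping code cells and commenting out markdown cells. The function name `notebook_to_script` is hypothetical.

```python
import json

# A tiny in-memory notebook in the nbformat 4 layout (sample content is made up).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Demo\n", "Some narrative text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["import math\n", "print(math.cos(0))"],
        },
    ],
})


def notebook_to_script(text):
    """Toy converter: keep code cells, turn markdown cells into comments."""
    nb = json.loads(text)
    chunks = []
    for cell in nb["cells"]:
        source = cell["source"]
        # The cell source may be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell["cell_type"] == "code":
            chunks.append(source)
        elif cell["cell_type"] == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


script = notebook_to_script(notebook_json)
print(script)
```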

Here is the nbviewer website:


In [10]:
website('nbviewer.ipython.org')


Out[10]:

We also maintain a gallery: a curated list of interesting IPython Notebooks on various topics.

Examples of Notebook-based publication

Cam Davidson-Pilon has written an entire book on Bayesian statistics as a set of IPython Notebooks that are hosted on GitHub and viewed on http://nbviewer.ipython.org.


In [11]:
website('http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/')


Out[11]:

Matthew Russell has written a book, published by O'Reilly, that includes IPython Notebooks for all of its examples.


In [12]:
website('http://shop.oreilly.com/product/0636920030195.do')


Out[12]:

Jose Unpingco has written a series of blog posts on signal processing using the IPython Notebook. These blog posts were the basis of a full-length book, Python for Signal Processing (Springer, 2013).


In [13]:
website('http://python-for-signal-processing.blogspot.com/')


Out[13]:

Jake Vanderplas and others publish technical blogs that are authored as IPython Notebooks.


In [14]:
website('http://jakevdp.github.io/blog/2013/08/28/understanding-the-fft/')


Out[14]:

People are now using nbviewer with Twitter to share and discuss a wide range of technical work.


In [15]:
Image('images/twitter_post.png')


Out[15]:

Presenting

The IPython Notebook is being used extensively for presentations (PyCon, PyData, Strata, Supercomputing, SciPy, SIAM) on technical topics across a wide range of fields. The Notebook has a cell toolbar for adding slide-related metadata to cells. We are working on improving this use case:

  1. Prototype a live presentation mode.
  2. Use nbconvert to export to a reveal.js-based slideshow.

Parallel Computing

On November 14, 2013, IBM announced that it was making its Jeopardy-playing supercomputer, Watson, available to developers on the internet as a service. In the summer of 2013, researchers on the Watson team revealed that they were using the IPython Notebook and IPython.parallel to improve Watson's performance and capabilities.

Before:

  • 8000 lines of Java code.
  • 2 minutes per analysis run.

After:

  • 220 lines of Python code.
  • 2 seconds per analysis run.

Teaching

The IPython Notebook is being used for lecture materials and student work in a number of university and high school courses on scientific computing and data science. Most of these courses are being developed publicly on GitHub. Here is a short list:


In [16]:
%%file courses.csv
"Course","University","Instructor"
"Data Science (CS 109)","Harvard University","Pfister and Blitzstein"
"Practical Data Science","NYU","Josh Attenberg"
"Scientific Computing (ASTR 599)","University of Washington","Jake Vanderplas"
"Working with Open Data","UC Berkeley","Raymond Yee"
"Computational Physics","Cal Poly","Jennifer Klay"


Writing courses.csv

In [17]:
import pandas

In [18]:
df = pandas.read_csv('courses.csv'); df


Out[18]:
   Course                           University                Instructor
0  Data Science (CS 109)            Harvard University        Pfister and Blitzstein
1  Practical Data Science           NYU                       Josh Attenberg
2  Scientific Computing (ASTR 599)  University of Washington  Jake Vanderplas
3  Working with Open Data           UC Berkeley               Raymond Yee
4  Computational Physics            Cal Poly                  Jennifer Klay
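
For readers without pandas, the same data can be read with the standard library's `csv` module instead. The sketch below swaps pandas for `csv.DictReader`, inlines the rows of `courses.csv` so that it runs without the file, and builds a lookup from university to course. The variable names are illustrative.

```python
import csv
import io

# Same rows as courses.csv, inlined so the sketch runs on its own.
COURSES_CSV = '''"Course","University","Instructor"
"Data Science (CS 109)","Harvard University","Pfister and Blitzstein"
"Practical Data Science","NYU","Josh Attenberg"
"Scientific Computing (ASTR 599)","University of Washington","Jake Vanderplas"
"Working with Open Data","UC Berkeley","Raymond Yee"
"Computational Physics","Cal Poly","Jennifer Klay"
'''

# Each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(COURSES_CSV)))

# Look up a course by the university that offers it.
course_by_university = {row["University"]: row["Course"] for row in rows}

for row in rows:
    print("{Course} | {University} | {Instructor}".format(**row))
```

With pandas available, `pandas.read_csv` remains the more convenient route, since it returns a DataFrame that the Notebook renders as a table.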

Styling


In [19]:
%load_ext load_style

In [20]:
%load_style talk.css


